[ENH] V1 → V2 API Migration - Tasks #1611

satvshr · 2026-01-09T09:06:07Z

Metadata

Reference Issue: stacks on [ENH] V1 → V2 API Migration - core structure #1576, fixes (towards [ENH] V1 → V2 API Migration #1575)
New Tests Added: No
Documentation Updated: No

ported functions to APIv1

codecov-commenter · 2026-01-09T09:13:33Z

Codecov Report

❌ Patch coverage is 39.76261% with 203 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.94%. Comparing base (99928f8) to head (1b19c08).

Files with missing lines	Patch %	Lines
openml/_api/resources/tasks.py	13.01%	147 Missing ⚠️
openml/_api/http/client.py	46.37%	37 Missing ⚠️
openml/_api/runtime/core.py	77.77%	6 Missing ⚠️
openml/_api/runtime/fallback.py	0.00%	6 Missing ⚠️
openml/_api/resources/datasets.py	77.77%	2 Missing ⚠️
openml/tasks/functions.py	60.00%	2 Missing ⚠️
openml/_api/__init__.py	75.00%	1 Missing ⚠️
openml/_api/config.py	96.87%	1 Missing ⚠️
openml/tasks/task.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1611      +/-   ##
==========================================
- Coverage   52.75%   44.94%   -7.82%     
==========================================
  Files          36       46      +10     
  Lines        4333     4508     +175     
==========================================
- Hits         2286     2026     -260     
- Misses       2047     2482     +435


geetu040

From a high-level review, I noticed a few points that need adjustment:

Caching can likely be removed from the SDK, since these concerns should be handled by the base client.
I don't see the api_context being used in tasks/functions, so it's not clear to me how the SDK is actually using the new API interface here.
Instead of moving entire methods out of tasks/functions.py, it would be better to stick to the goal of minimal SDK changes while enabling v2 support.
API calls should be updated at the specific root functions (for example _get_task_description, OpenMLTask._download_split).
For listing tasks, please follow the approach discussed in #1575 comment.

geetu040 · 2026-01-13T18:05:26Z

examples/Advanced/fetch_evaluations_tutorial.py

 )

-print(evals_setups.head(10))
+print(evals_setups.head(10))


Let's keep these changes away from this PR. If there are some ruff errors in the existing code, they should be fixed in another PR which will probably get merged soon.

I had accidentally run ruff format . on this branch. The ruff PR getting merged resolved these issues automatically, though.


geetu040

I have left some comments. Please take a look and make sure the signatures of all methods in TasksAPI, TasksV1 and TasksV2 stay the same.

geetu040 · 2026-01-15T15:53:57Z

openml/_api/resources/base.py

+    def get(self, dataset_id: int) -> OpenMLDataset | tuple[OpenMLDataset, Response]: ...
+
+
+class TasksAPI(ResourceAPI, ABC):


why are the methods commented out?

I was going to remove them. If I add abstract methods, they have to be for shared functions, right? The only shared function right now is get.

> if I add abstract methods they have to be for shared functions right?

They create a blueprint of this resource, so one can look at the resource class to see which methods are public and what their inputs and outputs look like.

These methods are expected to be implemented in all the child classes, so yes, they are used for shared functions.

> The only shared function right now is get.

list, delete, ...?

> list, delete, ...?

Not there for v2.

Still, the base class should have these. In the v2 class, just raise an exception, or maybe skip it and the exception will be raised automatically.
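
The "blueprint" idea in this thread can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the class names follow the thread, but the return types (plain dicts), the `TaskRecord` aliases, and the choice to make only `get` abstract while `list` and `delete` default to raising `NotImplementedError` are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Any

# Hypothetical aliases; defined at module level so that a method named
# `list` inside the classes does not shadow the builtin in annotations.
TaskRecord = dict[str, Any]
TaskRecords = list[TaskRecord]


class TasksAPI(ABC):
    """Blueprint: every public task operation is declared here."""

    @abstractmethod
    def get(self, task_id: int) -> TaskRecord:
        """Available in both API versions, so every subclass implements it."""

    def list(self, **filters: Any) -> TaskRecords:
        # Not abstract: a version without this endpoint can skip the method
        # and callers get this error automatically.
        raise NotImplementedError(f"{type(self).__name__} does not support list")

    def delete(self, task_id: int) -> bool:
        raise NotImplementedError(f"{type(self).__name__} does not support delete")


class TasksV1(TasksAPI):
    def get(self, task_id: int) -> TaskRecord:
        return {"task_id": task_id, "api": "v1"}

    def list(self, **filters: Any) -> TaskRecords:
        return [{"task_id": 1, "api": "v1"}]

    def delete(self, task_id: int) -> bool:
        return True


class TasksV2(TasksAPI):
    # Same signature for get; list and delete are intentionally omitted.
    def get(self, task_id: int) -> TaskRecord:
        return {"task_id": task_id, "api": "v2"}
```

With this layout, a reader can see the full public surface in `TasksAPI`, and calling `TasksV2().list()` fails with a clear `NotImplementedError` rather than an `AttributeError`. Note that if `list` and `delete` were declared with `@abstractmethod` instead, `TasksV2` could not be instantiated without defining them.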

geetu040 · 2026-01-15T15:54:02Z

openml/_api/resources/base.py

+    # @abstractmethod
+    # def list_tasks(
+    #     self,
+    #     *,
+    #     task_type: TaskType | None = None,
+    #     offset: int | None = None,
+    #     size: int | None = None,
+    #     **filters: Any,
+    # ):


this method should simply be called list

Replaced the name in TasksV1 (where the function actually exists)

Signatures should be the same for all 3 classes as of now: #1611 (review)

As I said above, in terms of functionality only get matches; otherwise I'd have done that.

I still don't understand

geetu040 · 2026-01-15T15:54:05Z

openml/_api/resources/tasks.py

+
+
+class TasksV1(TasksAPI):
+    @openml.utils.thread_safe_if_oslo_installed


You can remove this; it's not needed. It's related to caching and should be handled by the client.

geetu040 · 2026-01-15T15:54:08Z

openml/_api/resources/tasks.py

+    def get(
+        self,
+        task_id: int,
+        download_splits: bool = False,  # noqa: FBT002
+        **get_dataset_kwargs: Any,
+    ) -> OpenMLTask:
+        """Download OpenML task for a given task ID.
+
+        Downloads the task representation.
+
+        Use the `download_splits` parameter to control whether the splits are downloaded.
+        Moreover, you may pass additional parameter (args or kwargs) that are passed to
+        :meth:`openml.datasets.get_dataset`.
+
+        Parameters
+        ----------
+        task_id : int
+            The OpenML task id of the task to download.
+        download_splits: bool (default=False)
+            Whether to download the splits as well.
+        get_dataset_kwargs :
+            Args and kwargs can be used pass optional parameters to
+            :meth:`openml.datasets.get_dataset`.
+
+        Returns
+        -------
+        task: OpenMLTask
+        """
+        if not isinstance(task_id, int):
+            raise TypeError(f"Task id should be integer, is {type(task_id)}")
+
+        task = self._get_task_description(task_id)
+        dataset = get_dataset(task.dataset_id, **get_dataset_kwargs)
+        # List of class labels available in dataset description
+        # Including class labels as part of task meta data handles
+        #   the case where data download was initially disabled
+        if isinstance(task, (OpenMLClassificationTask, OpenMLLearningCurveTask)):
+            task.class_labels = dataset.retrieve_class_labels(task.target_name)
+        # Clustering tasks do not have class labels
+        # and do not offer download_split
+        if download_splits and isinstance(task, OpenMLSupervisedTask):
+            task.download_split()
+
+        return task
+
+    def _get_task_description(self, task_id: int) -> OpenMLTask:
+        result = self._http.get(f"task/{task_id}", return_response=True)
+
+        if isinstance(result, tuple):
+            task, _response = result
+        else:
+            task = result
+
+        return task


You should not copy this entirely from tasks/functions.py. Only the specific part that loads the task object should be here, which would probably be:

    response = self._http.get(f"task/{task_id}")
    task = self._create_task_from_xml(response.text)
    return task

Did you mean to highlight the entire get function or only _get_task_description? Is this:

    dataset = get_dataset(task.dataset_id, **get_dataset_kwargs)
    # List of class labels available in dataset description
    # Including class labels as part of task meta data handles
    #   the case where data download was initially disabled
    if isinstance(task, (OpenMLClassificationTask, OpenMLLearningCurveTask)):
        task.class_labels = dataset.retrieve_class_labels(task.target_name)
    # Clustering tasks do not have class labels
    # and do not offer download_split
    if download_splits and isinstance(task, OpenMLSupervisedTask):
        task.download_split()

not useful? Why?

> Did you mean to highlight the entire get function or only _get_task_description? Is this:

both

> not useful? Why?

What should I look at here? This is dataset-related code.

> What should I look at here? This is dataset-related code.

It is assigning attributes to the task object. Don't you think that's useful?

OK, I see. You are asking whether this should live in the SDK or in the resource class? If it can stay out of the resource class, then it should.
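
The split this thread converges on (the resource class only fetches and parses, while validation and dataset-related enrichment stay in the SDK function) might look roughly like the sketch below. Everything here is a stand-in for illustration: `FakeHttpClient`, `FakeResponse`, the `Task` dataclass, the `labels_by_dataset` lookup, and the namespace-free XML are assumptions, not the real OpenML client, task classes, or response format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: int
    dataset_id: int
    class_labels: list[str] = field(default_factory=list)


@dataclass
class FakeResponse:
    text: str


class FakeHttpClient:
    """Hypothetical stand-in for the HTTP client; returns canned XML."""

    def get(self, path: str) -> FakeResponse:
        task_id = path.rsplit("/", 1)[-1]
        xml = f"<task><id>{task_id}</id><dataset_id>61</dataset_id></task>"
        return FakeResponse(text=xml)


class TasksV1:
    """Resource class: only the server call and the parsing live here."""

    def __init__(self, http: FakeHttpClient) -> None:
        self._http = http

    def get(self, task_id: int) -> Task:
        response = self._http.get(f"task/{task_id}")
        return self._create_task_from_xml(response.text)

    @staticmethod
    def _create_task_from_xml(xml_text: str) -> Task:
        root = ET.fromstring(xml_text)
        return Task(int(root.findtext("id")), int(root.findtext("dataset_id")))


def get_task(task_id: int, backend: TasksV1,
             labels_by_dataset: dict[int, list[str]]) -> Task:
    """SDK-level function: validation and dataset enrichment stay here."""
    if not isinstance(task_id, int):
        raise TypeError(f"Task id should be integer, is {type(task_id)}")
    task = backend.get(task_id)
    # Stand-in for the dataset lookup that assigns class labels to the task.
    task.class_labels = labels_by_dataset.get(task.dataset_id, [])
    return task
```

In this arrangement, swapping `TasksV1` for a v2 resource class would not touch `get_task`, which is the "minimal SDK changes" goal stated in the review.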

geetu040 · 2026-01-15T15:54:15Z

openml/_api/resources/tasks.py

+
+        return self.__list_tasks(api_call=api_call)
+
+    def __list_tasks(self, api_call: str) -> pd.DataFrame:  # noqa: C901, PLR0912


Maybe use better helper functions like _create_list_url and _parse_list_response?

I am confused about what you're trying to say here. Do you mean I should move the functionality of list (previously list_tasks) into _create_list_url, and rename __list_tasks to _parse_list_response?

This works fine, but __list_tasks is not a good name for a helper function in this class.
I'd suggest you take a look at the datasets PR, which does something similar: #1608

Oh OK, only a rename? Will do.

geetu040 · 2026-01-15T15:54:21Z

openml/_api/resources/tasks.py

+    def get_tasks(
+        self,
+        task_ids: list[int],
+        download_data: bool | None = None,
+        download_qualities: bool | None = None,
+    ) -> list[OpenMLTask]:


keep this method in tasks/functions.py, because we are sticking to the rule "minimal sdk changes for v1/v2 compatibility"

Shouldn't I do the same for create_task and delete_task too?

Edit: Just saw your comment below :D

geetu040 · 2026-01-15T15:54:24Z

openml/_api/resources/tasks.py

+    def create_task(
+        self,
+        task_type: TaskType,
+        dataset_id: int,
+        estimation_procedure_id: int,
+        target_name: str | None = None,
+        evaluation_measure: str | None = None,
+        **kwargs: Any,
+    ) -> (


this should stay in tasks/functions.py

geetu040 · 2026-01-15T15:54:28Z

openml/_api/resources/tasks.py

+        bool
+            True if the deletion was successful. False otherwise.
+        """
+        return openml.utils._delete_entity("task", task_id)


I'll implement this in the base class; you can replace this with that later.

As discussed today during the standup, makes sense

geetu040 · 2026-01-15T15:54:31Z

openml/_api/resources/tasks.py

+
+        return cls(**common_kwargs)
+
+    def list_task_types(self) -> list[dict[str, str | int | None]]:


is this used anywhere?

Nope, there is an endpoint for it though; same for get_task_type.

I'd say just remove it, since it suits a TaskType resource, though it's not needed anywhere now.

I added TaskType as part of this PR too. There is only one endpoint for Tasks and two for TaskType, so I just added it into Tasks, given they're clubbed together in the docs for endpoints too.

> they're clubbed together in the docs for endpoints too

In v2, if I remember correctly, this is not the case. Anyway, I still think they should be removed, as they are not being used anywhere in the SDK.

I was talking about v2 only; tasks and tasktype share the same header in the v2 docs page. The tasktype endpoints are not being used anywhere, so should tasktypes just not exist? Do you want me to remove them from this PR and park them in a draft PR in case they are ever used (which will probably get lost over time), or should we let the endpoints and the functions that deserialize their responses exist for no reason?

geetu040 · 2026-01-15T15:54:36Z

openml/tasks/functions.py

-        raise OpenMLCacheException(f"Task file for tid {tid} not cached") from e
-
-
-def _get_estimation_procedure_list() -> list[dict[str, Any]]:


Keep this method here, and inside it try to use the method list_estimation_procedures already implemented in evaluations/functions.py.

list_estimation_procedures returns only the "oml:name", whereas _get_estimation_procedure_list requires more items. They may call the same API, and list_estimation_procedures may be somewhat of a subset of _get_estimation_procedure_list, but that does not mean it can be used inside _get_estimation_procedure_list.


satvshr · 2026-01-16T19:27:34Z

@geetu040 I will explain the confusion I am facing to the best of my ability. I feel that communicating on the PR threads is yielding no results, so I will put everything here:

1. Over here you say the base class should have v1 functions like list, delete, and create. To me, this means that all v1-related functions should be moved from functions.py to TasksV1 (from SDK to resources), and that calls inside functions.py should be replaced with calls to api_context.backend.task.method. Over here, you say something similar, stating that TasksV1, TasksV2, and TasksAPI should have the same function signatures.
2. Over here and here you mention that we should try to keep most of the code in the SDK and not in resources (contradictory to point 1).
3. Completely contrary to point 1, you mention over here that get_tasks, along with create_task and delete_task, should stay in the SDK.

I seem to be getting two contradictory messages from your end, which is where the confusion arises.
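
The delegation pattern mentioned in the first point, where an SDK function keeps its public signature and forwards only the server call to `api_context.backend.task.<method>`, could be sketched as below. Only the `api_context.backend.task` access path comes from the thread; the `_APIContext` class, its `set_version` method, and the dict return values are hypothetical and exist only to make the example runnable.

```python
from typing import Any


class _V1Tasks:
    def get(self, task_id: int) -> dict[str, Any]:
        return {"task_id": task_id, "source": "v1"}


class _V2Tasks:
    def get(self, task_id: int) -> dict[str, Any]:
        return {"task_id": task_id, "source": "v2"}


class _Backend:
    """Groups the resource objects for one API version."""

    def __init__(self, task: Any) -> None:
        self.task = task


class _APIContext:
    """Hypothetical holder for the currently active backend."""

    def __init__(self) -> None:
        self._backends = {"v1": _Backend(_V1Tasks()), "v2": _Backend(_V2Tasks())}
        self.backend = self._backends["v1"]

    def set_version(self, version: str) -> None:
        self.backend = self._backends[version]


api_context = _APIContext()


def get_task(task_id: int) -> dict[str, Any]:
    """Public SDK function: signature unchanged, only the root call delegates."""
    return api_context.backend.task.get(task_id)
```

Under this reading, identical signatures across the resource classes are what allow `get_task` to stay unaware of which version is active, while anything that does not talk to the server can remain in the SDK module.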

geetu040 and others added 10 commits December 30, 2025 09:11

set up folder structure and base code

0159f47

Merge branch 'main' into migration

58e9175

Merge branch 'main' into migration

bdd65ff

fix pre-commit

52ef379

refactor

5dfcbce

implement cache_dir

2acbe99

refactor

af99880

Merge branch 'main' into pr/1576

74ab366

git commit --no-verify

17a7178

ported functions to APIv1

Merge branch 'main' into tasks

510b286

geetu040 mentioned this pull request Jan 9, 2026

[ENH] V1 → V2 API Migration #1575

Open

25 tasks

satvshr and others added 4 commits January 12, 2026 01:35

commiting latest cahnges

c2b9e1a

[pre-commit.ci] auto fixes from pre-commit.com hooks

056cf3a

for more information, see https://pre-commit.ci

Merge remote-tracking branch 'geetu040/migration' into tasks

fb1ff40

bug fixing

17ab23c

satvshr marked this pull request as ready for review January 12, 2026 15:29

geetu040 suggested changes Jan 13, 2026

View reviewed changes

satvshr added 3 commits January 14, 2026 16:51

commiting intermediate changes

e07ef73

removed caching

fb57a3e

removed uneccesary imports

8e041a4

satvshr marked this pull request as draft January 14, 2026 20:25

satvshr and others added 4 commits January 15, 2026 01:59

merge main

61ca98c

[pre-commit.ci] auto fixes from pre-commit.com hooks

e5dd2d9

for more information, see https://pre-commit.ci

small comments

202314e

Merge branch 'tasks' of https://github.com/satvshr/openml-python into…

3a2f1c4

… tasks

satvshr changed the title ~~[ENH] Tasks Migration~~ [ENH] V1 → V2 API Migration - Tasks Jan 15, 2026

geetu040 suggested changes Jan 15, 2026

View reviewed changes

satvshr and others added 2 commits January 15, 2026 23:42

Merge branch 'main' into tasks

a0c2267

[pre-commit.ci] auto fixes from pre-commit.com hooks

249efec

for more information, see https://pre-commit.ci

satvshr and others added 4 commits January 16, 2026 15:35

Merge branch 'main' into tasks

69dd3c6

requested changes

0d5ce53

requested changes

e15e892

[pre-commit.ci] auto fixes from pre-commit.com hooks

1b19c08

for more information, see https://pre-commit.ci
